Skip to content

feat(network): IPv6/dual-stack monitors + gateway wiring (Node Doctor TaskForge feature 1125)#22

Merged
mattmattox merged 38 commits into
mainfrom
feat/ipv6-dualstack
Jun 25, 2026
Merged

feat(network): IPv6/dual-stack monitors + gateway wiring (Node Doctor TaskForge feature 1125)#22
mattmattox merged 38 commits into
mainfrom
feat/ipv6-dualstack

Conversation

@mattmattox

@mattmattox mattmattox commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Autonomous sequential implementation of the Node Doctor IPv6 / dual-stack backlog (TaskForge project 41, features 1124–1130). Every commit verified green: go build ./..., package tests (-short), go vet, gofmt; full go test ./... -short green at each feature boundary.

Scope completed (29 tasks)

  • New monitors (1125): network-ipv6-{sysctl,route,neighbor,firewall} — detection-only, self-registering, 93–95% coverage; registration verified in cmd.
  • Pinger/gateway/DNS/CNI polish (1124): gateway dual-stack addressFamily selection + metric-based default-route selection; pinger IPv6 link-local zone IDs, ICMPv6 filter + per-instance ID; httpPinger family pre-resolution; kubelet ::1 loopback fallback; DNS AAAA warnings; CNI family refresh-on-re-IP; dual-stack :: bind for exporter/health server; fixed flaky TestCorrelationDetectionFlow.
  • Metrics (1127): address_family label on network metrics; family-aware CNI dashboard + new 29-panel IPv6 dashboard; per-family + IPv6 PrometheusRule alerts.
  • Self-metrics (1126): Go/Process collectors verified; monitor-cycle, exporter-health, circuit-breaker-state, config-hot-reload self-metrics (decoupled observer pattern, no import cycles); authoritative registration test.
  • Remediation (1128): ip -6 route flush cache operation; verified flush-dns covers AAAA.
  • Tests/CI (1129): build-tagged IPv6 pinger integration tests; kind dual-stack integration test; dedicated ci-ipv6.yml workflow (Docker IPv6 + kind).
  • Config/helm/docs (1130): IPv6 monitors in default config + helm values/configmap; chart README + docs (configuration/monitors/remediation) IPv6 sections.

Not in this PR (tracked separately)

  • #17222 (register ipv6-route-flush remediator) is blocked by a discovered systemic gap: no built-in remediators are registered into the registry at all (registry.Remediate(strategy,…) fails for every type). Filed as node-doctor-236 / #19263 (high). The flush-ipv6-route operation is implemented + tested.
  • Pre-existing flaky freePort TOCTOU in prometheus tests filed as node-doctor-235 / #19260.

Per project policy this is not auto-merged.

Detection-only monitor (type network-ipv6-sysctl) reading
/proc/sys/net/ipv6/conf/{all,default,<iface>}/disable_ipv6 and emitting
IPv6SysctlMisconfigured when IPv6 is disabled but expected enabled.
Self-registers via init(); cmd wiring owned by #17209.

Adds table-driven unit tests (94.5% coverage) with t.TempDir() /proc
fixtures plus static testdata fixtures.
…(Task #17239)

Wire the landed detectDefaultIPv6Gateway parser into the gateway monitor.
New config key addressFamily (ipv4|ipv6|auto, default ipv4 preserves prior
behavior); getGatewayIP now returns the probed family and records it on
types.GatewayLatency.AddressFamily. auto prefers IPv4, falls back to IPv6.
Prometheus address_family label deferred to #17216.

Note: task's stated filesAffected (pkg/metrics/latency.go, pkg/events/
network.go) do not exist; GatewayLatency lives in pkg/types/types.go.
…#17206)

New detection-only monitor (type network-ipv6-route) that reuses the
landed detectDefaultIPv6GatewayFromFile parser to verify an IPv6 default
route is present when expected, emitting IPv6DefaultRouteMissing.
Distinguishes unreadable route table (warning, IPv6 may be absent) from
genuinely-absent default route. Self-registers via init(); cmd wiring
owned by #17209. Unit tests at 95.9% coverage.
…sk #17207)

New /proc-based detection-only monitor (type network-ipv6-neighbor)
parsing /proc/net/if_inet6 (link-local/global address presence) and
accept_ra/autoconf sysctls. Emits IPv6LinkLocalMissing,
IPv6GlobalAddressMissing, IPv6RouterAdvertisementDisabled. No new deps;
netlink NDP neighbor-cache documented as out-of-scope follow-up.
Self-registers via init() (#17209 owns cmd wiring). 93.2% coverage.
DETECTION-ONLY monitor (type network-ipv6-firewall) that lists ip6tables
(-S) or nft (list ruleset) read-only and flags IPv6FirewallBlackhole when
all built-in filter chains default-DROP with no ACCEPT anywhere. backend
config auto|ip6tables|nft. Missing binary / read error -> warning, never
an error; never issues a mutating firewall verb. Injectable command
executor for tests; 94.9% coverage. Self-registers (#17209 owns cmd).
The four IPv6 monitors (network-ipv6-{sysctl,route,neighbor,firewall})
self-register via the blank import of pkg/monitors/network and are
auto-enabled by ApplyDefaultMonitors since each provides a DefaultConfig
— no per-monitor wiring needed in cmd. This test pins that contract
(registration + default-application) so a future regression fails loudly.
Shipped-config additions are owned by #17228.
}

func (e *defaultCommandExecutor) Run(ctx context.Context, name string, args ...string) ([]byte, error) {
cmd := exec.CommandContext(ctx, name, args...)
// Scope 0x20 = link-local, 0x00 = global. Parse errors on individual lines are
// skipped; a read/open failure is returned to the caller.
func parseIfInet6File(path string) ([]ipv6Address, error) {
f, err := os.Open(path)
// readSysctlInt reads a sysctl-style file and returns its integer value (after
// trimming whitespace). Used for accept_ra / autoconf, which take values 0/1/2.
func readSysctlInt(path string) (int, error) {
data, err := os.ReadFile(path)
// (trimmed of whitespace) is "1". Any other value is treated as false. Errors
// are propagated.
func readSysctlBool(path string) (bool, error) {
data, err := os.ReadFile(path)
…(Task #17203)

Bind Prometheus exporter and health server to :: by default (accepts
IPv4+IPv6 when bindv6only=0), with graceful fallback to 0.0.0.0 when the
IPv6/dual-stack bind fails (nodes may have IPv6 disabled). Exporter bind
address is now configurable via prometheus.bindAddress (default ::);
replaced hardcoded 0.0.0.0 sprintf with net.JoinHostPort. Extracted
listenWithFallback/isDualStackHost helpers (unit-tested). Server inherits
caller ctx via BaseContext. cmd/main.go health server default -> ::.
…204)

When a kubelet healthz/metrics probe to a loopback host fails with a
connection-level (dial) error, retry once against the opposite-family
loopback ([::1] <-> 127.0.0.1) preserving scheme/port/path/query/auth.
Only loopback hosts are rewritten; HTTP 4xx/5xx and deadline timeouts do
not trigger fallback. Shared seam doRequestWithLoopbackFallback covers
both healthz and metrics. Helpers unit-tested; ::1 bind test skips when
IPv6 loopback unavailable.
…17233)

resolveTarget now parses fe80::1%eth0 (split on last %, retains zone) and
returns the zone; singlePing sends via net.IPAddr{IP,Zone} so link-local
pings carry the required scope. Also fix reply matching to compare IP
bytes (IP.Equal) instead of String(), so zoned replies (fe80::1%eth0)
match the zoneless target. Family stays ipv6 for link-local. Tests cover
zone parsing, destAddr threading, and zone-tolerant peer matching.
…234)

Pre-resolve hostname targets via the injectable Resolver (LookupIP),
select an address (IPv4-preferred, IPv6 if no v4) to set PingResult.Family
truthfully, and pin the dial to that IP via a per-target Transport
DialContext (original Host header preserved for TLS/vhost). Resolution
failure falls back to the original unpinned probe with empty family. IP
literals unchanged (resolver not consulted). Tests use a fake resolver.
…#17235)

Install ipv6.ICMPFilter accepting only EchoReply on the v6 listener
(non-fatal if unsupported; receive loop still filters). Replace the
process-global ICMP id with a per-pinger-instance id (pid mixed with an
atomic counter, masked to 16 bits, non-zero) so concurrent pingers don't
cross-match replies. Live-socket integration test stays gated by
testing.Short(). Adds id-uniqueness unit test.
// is false. The label argument ("healthz" or "metrics") is used only for
// logging.
func (c *defaultKubeletClient) doRequestWithLoopbackFallback(req *http.Request, label string) (resp *http.Response, usedFallback bool, err error) {
resp, err = c.client.Do(req)

// Rebuild the request against the alternate loopback, preserving method,
// context, and headers (including any auth header already applied).
fbReq, buildErr := http.NewRequestWithContext(req.Context(), req.Method, fallbackURL, nil)
log.Printf("[INFO] kubelet %s: loopback probe to %s failed (%v), retrying %s",
label, req.URL.Host, primaryErr, fbReq.URL.Host)

resp, err = c.client.Do(fbReq)
// the value is deterministic and unique per instance (no randomness required).
func nextPingerID() uint16 {
n := atomic.AddUint32(&pingerInstanceCounter, 1)
id := uint16(os.Getpid()) + uint16(n)
// the value is deterministic and unique per instance (no randomness required).
func nextPingerID() uint16 {
n := atomic.AddUint32(&pingerInstanceCounter, 1)
id := uint16(os.Getpid()) + uint16(n)
id := uint16(os.Getpid()) + uint16(n)
if id == 0 {
// Avoid an all-zero ID, which is a poor discriminator on the wire.
id = uint16(n) | 0x8000
}
fbReq.Header = req.Header.Clone()

log.Printf("[INFO] kubelet %s: loopback probe to %s failed (%v), retrying %s",
…ing (Task #17236)

Add NODE_DOCTOR_ICMP_INTEGRATION env gate: when set (CI), socket/
permission failures in TestDefaultPinger_Integration become hard failures
instead of silent t.Skip; -short still skips, unset+non-short still
skips gracefully on unprivileged dev boxes. New Makefile target
test-net-icmp-integration and a separate (non-blocking) CI job compile
the test binary as the runner user and run it under sudo for CAP_NET_RAW.
Add TestResolveTarget_HostnameDNSPath (localhost via /etc/hosts, no
network) exercising the net.ParseIP-fails -> net.LookupIP branch with
loopback/family/zone assertions, plus TestResolveTarget_HostnameResolution
Failure using an RFC 6761 .invalid name for the error branch. Test-only.
Both default-route parsers now scan all default routes and select the
lowest-metric one (kernel behavior) instead of first-seen. IPv4 metric =
/proc/net/route col 6 decimal; IPv6 metric = /proc/net/ipv6_route col 5
hex. Tie -> first-seen; malformed metric -> treated as max (line still
eligible if sole default). New dedicated multi-default fixtures (shared
ipv6_route fixture untouched to protect #17239 tests).
…241)

When a custom DNS query is RecordType AAAA with TestEachNameserver and/or
ConsistencyCheck enabled (both A-only), emit a single Warning event
(AAAAFeatureUnsupported) per query naming the skipped feature(s) and
domain, instead of silently dropping them. AAAA lookup behavior and the
A-query path are unchanged. Tests cover each flag, both, neither, and A.
Add TestCheckCustomQueries_AAAAScopedAddresses asserting the AAAA custom-
query path treats scoped/non-global IPv6 results (fe80::/10 link-local,
fc00::/7 ULA, ::1 loopback, and a scoped+global mix) as a successful
lookup emitting no error/no-records events. Test-only.
…(Task #17243)

Add TestCheckCustomQueries_AAAARecordTypeInEventMessage asserting the
failure, no-records, and high-latency events for AAAA custom queries
include the "AAAA" record-type string in their message. Test-only.
The #17202 carry-forward made PeerStatus.Family monotonic-sticky: a fresh
probe resolving a different family (peer re-IP v4<->v6, overlay-test
toggle) was ignored. Now the most recent check's resolved family wins;
the prior family is retained only when a check resolves no family at all
(transient failure / pinger error early-return), preserving #17202
behavior. Tests cover both re-IP directions + carry-forward boundary.
…s (Task #17232)

Root cause: the test started the controller server on hardcoded 0.0.0.0:
8080 only to run the correlator's background loop; the bind collided with
a port already in use, and two subtests ignored the Start() error so the
correlator loop never ran -> no correlations -> misleading assertion
failure. Fix (test-only): use an ephemeral port (Port=0), assert
Start() succeeds in every subtest, and replace fixed sleeps with the
Eventually poll helper. Assertions unchanged.
Add address_family label (ipv4|ipv6|unknown) to GatewayLatencySeconds,
Peer{Latency,LatencyAvg,Reachable}, and DNSLatencySeconds. Add
AddressFamily to types.{PeerLatency,DNSLatency,APIServerLatency}
(GatewayLatency already had it) and populate from PeerStatus.Family,
DNS RecordType (AAAA->ipv6), and the classified API-server endpoint.
familyLabel() normalizes empty/unexpected -> unknown so the label is
never empty. No import cycle (monitors use existing family consts;
types untouched re imports). Tests for label emission + classifier.
…ls (Task #17217)

Add an address_family query template variable (multi/includeAll) to the
CNI network-health dashboard and apply address_family=~"$address_family"
to the peer latency/reachability panels, grouping the per-node timeseries
by (node, address_family) so IPv4 vs IPv6 series are distinguishable. The
overview/system/kubernetes dashboards have no relabeled-metric panels and
are unchanged.
…k #17218)

Add dashboards/node-doctor-ipv6.json (29 panels): IPv6 monitor condition
health (condition_status{condition_type=~"IPv6.*"} for the six new
conditions), condition activity over time, dual-stack connectivity
(peer/gateway latency + reachability filtered by address_family, A-vs-AAAA
DNS), and IPv6 monitor-up/events. Mirrors the CNI dashboard schema/
templating; all PromQL uses verified metric+label names.
… (Task #17219)

Rewrite NodeDoctorHighPeerLatency and NodeDoctorLowPeerConnectivity to
aggregate by (node, address_family) using the family-labeled peer gauges,
so IPv4/IPv6 alert independently and the firing alert carries
address_family. Add NodeDoctorIPv6Misconfigured firing on
condition_status{condition_type=~"IPv6.*"}==1 (for 10m, address_family=
ipv6 label), gated by prometheusRule.warning.ipv6Misconfigured.enabled
(default true). Mirror the two rewrites + new alert into the static
deployment/prometheusrule.yaml. helm template renders; YAML valid.
…Task #17210)

NewRegistry (the registry the exporter serves) already registers the Go
runtime and process collectors. Add TestNewRegistry_IncludesGoAndProcess
Collectors asserting go_goroutines and (on linux) process_* /
process_start_time_seconds are exposed, so the runtime/process self-
metrics can't silently regress. Test-only; functionality already present.
Add monitor_cycles_total{node,monitor_name,result} and
monitor_cycle_last_timestamp_seconds{node,monitor_name} (last-run
heartbeat), recorded once per cycle in ExportStatus via new
RecordMonitorCycle. Also wire the previously-defined-but-unobserved
monitor_check_duration_seconds histogram. result=error when the status
carries any ConditionFalse. Tests for the recorder + ExportStatus path.
Add exporter_healthy, exporter_last_success_timestamp_seconds, and
exporter_consecutive_failures gauges ({node,exporter}), updated via a
recordExportHealth helper at the success/error branches of both
ExportStatus and ExportProblem. Consecutive failures tracked in a
mutex-guarded field mirrored to the gauge. Validation-failure branches
now also record ExportErrorsTotal (previously had no production site).
Tests cover success/failure/reset. Race-clean.
Expose the remediator circuit-breaker state as gauge
remediator_circuit_breaker_state{node} (0=closed,1=open,2=half-open) via
a push-observer: remediators defines CircuitStateObserver (int-based, no
prometheus import), the registry notifies it on every transition and on
SetCircuitStateObserver (current-state push); the prometheus exporter
implements ObserveCircuitState to set the gauge; main wires them when
both exist. No import cycle. Tests on both sides.
Add config_reloads_total{node,result}, config_reload_last_timestamp_
seconds, config_reload_last_success, config_reload_duration_seconds.
ReloadCoordinator gains an injected ReloadMetricsRecorder func (no
prometheus import); performReload uses a named return + deferred closure
so every return path (load/validation/callback error, no-op, success)
records exactly once. Exporter.RecordConfigReload implements it; wired in
main via det.SetReloadMetricsRecorder(promExporter.RecordConfigReload).
Tests on both sides; no import cycle.
…215)

Add TestSelfMetricsRegistered: registers Metrics into NewRegistry (incl.
go/process collectors), populates every self-metric via its recorder
(RecordMonitorCycle/recordExportHealth/ObserveCircuitState/
RecordConfigReload + direct export-ops Inc/Observe), gathers, and asserts
all 14 node_doctor_* self-metric families plus go_goroutines and (linux)
process_start_time_seconds are present. Test-only.
Add NetworkFlushIPv6Route ("flush-ipv6-route") operation to
NetworkRemediator: runs ip -6 route flush cache, optional ip -6 route show
backup when BackupRouting. Unlike reset-routing, a flush failure is a hard
error (dedicated flush remediator). verifyOperation is a no-op (immediate).
Registration/enablement left to #17222. Tests assert exact command args,
failure semantics, backup ordering, and verify path.
Document and test that flushDNS is address-family agnostic: resolvectl
flush-caches / systemd-resolve --flush-caches clear the full resolver
cache (A and AAAA), so no separate IPv6 DNS-flush op is needed. New
TestNetworkRemediator_FlushDNS_CoversIPv6 asserts the exact full-cache
flush command for both backends and that no family/type-restricting flag
(-4/-6/--type) is passed.
Add pinger_ipv6_integration_test.go (//go:build integration) exercising
the real defaultPinger against ::1 and a discovered link-local target,
asserting FamilyIPv6 classification; skips cleanly when IPv6 loopback or
CAP_NET_RAW is unavailable. Complements the untagged platform-agnostic
v6 unit tests (zone/family/destAddr/peer-match) in pinger_test.go. The
-short unit tier never opens raw sockets.
…226)

Add test/integration/testdata/kind-dualstack.yaml (networking.ipFamily:
dual, dual pod/service subnets) and test/integration/ipv6_kind_test.go
(//go:build integration): creates a uniquely-named dual-stack kind
cluster, waits for node readiness, and asserts dual-stack — node PodCIDRs
+ InternalIPs carry both families and a RequireDualStack Service gets
both an IPv4 and IPv6 ClusterIP (fails on single-stack). Skips cleanly
when short/kind-missing/docker-down; t.Cleanup always deletes the
cluster; isolated temp kubeconfig.
Add .github/workflows/ci-ipv6.yml with two jobs: ipv6-icmp (compile
network tests with -tags=integration, run the v6 ICMP tests under sudo +
NODE_DOCTOR_ICMP_INTEGRATION for CAP_NET_RAW) and ipv6-kind-dualstack
(enable Docker daemon IPv6 with ip6tables+experimental — required for
kind dual-stack — install pinned kind, run TestIPv6KindDualStackCluster
with 20m timeout + always-cleanup). Path-filtered to IPv6-relevant dirs,
workflow_dispatch, concurrency cancel. Action/Go versions match ci.yml.
…17228)

Add explicit ipv6-sysctl-check, ipv6-route-check, ipv6-neighbor-check,
and ipv6-firewall-check entries (matching each monitor's DefaultConfig)
to config/node-doctor.yaml, with a note that they are detection-only and
degrade gracefully on IPv4-only nodes (set expectIPv6Enabled/enabled
false to silence). Validated via --validate-config.
…7229)

Add ipv6Sysctl/ipv6Route/ipv6Neighbor/ipv6Firewall monitor blocks to
values.yaml(.template) and render them in configmap.yaml (each gated by
its .enabled), using procPath /host/proc (host fs bind-mount). Switch
exporters.http.bindAddress default 0.0.0.0 -> :: (dual-stack; server
falls back to 0.0.0.0 when IPv6 is disabled). helm template renders;
rendered config contains the four network-ipv6-* types and bindAddress "::".
Document the four detection-only IPv6 monitors (ipv6Sysctl/Route/Neighbor/
Firewall), the dual-stack :: bindAddress default with IPv4 fallback, how
they degrade gracefully on IPv4-only nodes (and how to disable them), and
the address_family metric label + IPv6 PrometheusRule alerts.
… (Task #17231)

configuration.md: IPv6/dual-stack subsection (4 detection monitors with
config keys, gateway addressFamily, DNS recordType: AAAA, :: bind +
fallback, IPv4-only degradation/disable). monitors.md: section + TOC for
the four network-ipv6-* monitors (checks, keys, conditions) + address_
family metric label. remediation.md: flush-ipv6-route operation (ip -6
route flush cache) and family-agnostic flush-dns note. Verified all type
strings/conditions/keys against source; did not claim a registered
ipv6-route-flush remediator TYPE (that registration is blocked, #19263).
@mattmattox mattmattox marked this pull request as ready for review June 25, 2026 10:50

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f0362a84f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +249 to +250
if cond.Status == types.ConditionFalse {
return true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat active problem conditions as failed cycles

Several monitors use problem-style conditions where True means the problem is active and False means it is cleared (for example the new IPv6 monitors emit IPv6DefaultRouteMissing=True/IPv6FirewallBlackhole=True, and existing NetworkPartitioned=True follows the same pattern). With this check, those active failures are recorded as monitor_cycles_total{result="success"}, while healthy/cleared False states are recorded as errors, so the new monitor-cycle self-metrics are inverted for these monitors.

Useful? React with 👍 / 👎.

// One ExportStatus call corresponds to one completed monitor check cycle:
// the monitor ran its check and emitted a status, which the detector forwards
// here. Time the cycle and record self-metrics at the end via RecordMonitorCycle.
cycleStart := time.Now()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Measure check duration before exporting status

ExportStatus is called from processStatus only after the monitor check has already completed and its Status has been received, so starting cycleStart here measures Prometheus export processing time rather than the monitor check duration. Slow monitor executions will therefore look fast in monitor_check_duration_seconds, making the new self-metric misleading for latency alerts and dashboards.

Useful? React with 👍 / 👎.

Comment thread pkg/health/server.go
Comment on lines +123 to +124
if err == nil {
return ln, nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bind IPv4 when :: succeeds as IPv6-only

On nodes with net.ipv6.bindv6only=1, binding [::]:port succeeds but accepts only IPv6 connections; because this branch returns immediately, the IPv4 fallback never starts. Since this commit changes the health/metrics defaults to ::, IPv4 Kubernetes probes or Prometheus scrapes fail on those hardened nodes even though the code promises a dual-stack bind with IPv4 fallback.

Useful? React with 👍 / 👎.

Comment on lines +24 to +25
ip := net.ParseIP(host)
return ip != nil && ip.To4() == nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit IPv4 fallback to wildcard IPv6 binds

If an operator configures a concrete IPv6 bindAddress such as fd00::10, any bind error (address typo, address not assigned, etc.) is classified as dual-stack and retried on 0.0.0.0, silently exposing the metrics endpoint on every IPv4 interface instead of failing the bad explicit bind. The fallback should be restricted to the empty/unspecified wildcard address, not all IPv6 literals.

Useful? React with 👍 / 👎.

@mattmattox mattmattox merged commit b804f3b into main Jun 25, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants